Explaining the origin of viruses
remains an important challenge for 
evolutionary biology. Previous explana- 
tory frameworks described viruses as 
founders of cellular life, as parasitic 
reductive products of ancient cellu- 
lar organisms or as escapees of modern 
genomes. Each of these frameworks 
endows viruses with distinct molecular,
cellular, dynamic and emergent prop- 
erties that carry broad and important 
implications for many disciplines, includ- 
ing biology, ecology and epidemiology. 
In a recent genome-wide structural phy- 
logenomic analysis, we have shown that 
large-to-medium-sized viruses coevolved 
with cellular ancestors and followed
a reductive evolutionary route. Here
we interpret these results and provide a 
parsimonious hypothesis for the origin 
of viruses that is supported by molecu- 
lar data and objective evolutionary bio- 
informatic approaches. Results suggest 
two important phases in the evolution of 
viruses: (1) origin from primordial cells 
and coexistence with cellular ancestors 
and (2) prolonged pressure of genome 
reduction and relatively late adaptation 
to the parasitic lifestyle once virions 
and diversified cellular life took over the 
planet. Under this evolutionary model, 
new viral lineages can evolve from exist- 
ing cellular parasites and enhance the 
diversity of the world's virosphere. 

The Virus Problem 

Viruses are intriguing biological entities 
that are borderline between inanimate 
and living matter. They have RNA- or
DNA-based genomes with single- and
double-stranded nucleic acids, but lack 
functional translation machinery respon- 
sible for protein synthesis, including 
ribosomes, and their own metabolism. 
Consequently, they require a host to repli- 
cate and spread as viral particles (virions) 
in large numbers populating the lands and 
the seas. They often integrate into cellular 
genomes and massively enrich the genetic 
repository of numerous organisms, includ- 
ing animals, plants and fungi. 1 They also 
cause important diseases and are economi- 
cally relevant. Viruses are believed to have 
played important roles in the evolution of 
cellular organisms (hereinafter referred 
to as cells). 2-4 Despite their remarkable
abundance in marine environments (~10^9
bacteriophages/L and > 50 genotypes/L) 5-8
and puzzling diversity (numerous morphological
forms and replication strategies), 9
viruses, in general, have been
excluded from phylogenetic and phylogenomic
studies. 10-14 Many scientists support
viral exclusion based on their minute size, 
parasitic nature, lack of metabolic activity 
and inability to self-replicate. 15 For these 
and other reasons (see ref. 15), viruses are 
considered unworthy of living status and 
their placement alongside cells in the "tree 
of life" (ToL) unwarranted. Unfortunately 
and unlike cells, viruses leave no fos- 
sil records. Their evolutionary trajec- 
tories must therefore be deduced from 
extant viral features, a proposition that 
is problematic. Historically, the question
of the origin of viruses, and of life itself,
has remained for the most part a philosophical
debate, addressed largely with theoretical
arguments rather than molecular data,
especially because viral genomic repertoires
are limited and patchy.

Figure 1. Three general frameworks to
explain the origin of viruses. Many alternatives
are possible within each hypothetical
framework but are not made explicit in the
diagrams. Virospheres are illustrated with
clouds. We note that they can be physically
linked but functionally disjoint. A, Archaea; B,
Bacteria; E, Eukarya.

Prevalent Views About the Origin 
of Viruses and Their Evolutionary 
Roles 

Three general theories have been proposed 
to explain the origin of viruses 4 (Fig. 1). 
The "virus-first" hypothesis states that 
viruses predated cells and contributed to 
the rise of cellular life. 2,3 A significant
proportion of viral genomes encodes
sequences that lack clear cellular
homologs. The presence of such virus-specific
sequences supports a unique origin
for viruses. 2,3 In contrast, all known viruses
need a cellular host to replicate, thus
necessitating the existence of cells before
virus survival. 4 Therefore, the virus-first 
hypothesis has been challenged and the 
existence of an ancient and independent 
viral world critiqued. An alternative general
hypothesis links the origin of
viruses to cells and considers viruses to
be the reduced forms of parasitic organisms. 16
This hypothesis, better known as
the "reduction hypothesis," is supported 
by the recent discovery of giant viruses 
(e.g., mimiviruses and megaviruses) 17-19 
with genomic and physical features that 
overlap those of numerous parasitic bacteria.
A third prevalent hypothesis, the
"escape hypothesis," suggests that viruses
were once part of the genetic material of
host cells but escaped cell control and later
evolved by pickpocketing genes via horizontal
gene transfer (HGT) (reviewed in
refs. 2-4). HGT is believed by some scientists
to be the predominant force shaping
many viral genomes. 15,20 This hypothesis,
however, fails to explain the presence of
structures that are unique to viruses and
are not present in cells. 3,4,21

Despite disagreements, viruses are con- 
sidered to be key contributors to the evolu- 
tion of cells. Viruses, for example, could 
have mediated the evolutionary transi- 
tion from RNA to DNA. 22 Adding to the 
already expanding roles of viruses, Patrick 
Forterre also proposed the "virocell" con- 
cept that links virions and cells. 23 While 
virions are protein-encapsidated infectious 
particles that contain the viral genome, 
virocells are bona fide cells that are under
virus control and have the potential to actively
produce virions. In contrast, ribocells 
represent cells that require ribosomes to 
actively function and divide. 23 Virions 
and ribocells engage in dynamic life cycles 
specific to organismal groups while ribo- 
somes and ribocells are part of a stable and 
tightly integrated universal system. These 
properties restrict evolutionary outcomes. 

Structural Phylogenomics 
Reveals the Ancient Cellular 
Origin of Viruses 

Hypotheses of viral origin have been hotly
debated and contested. 2-4,15,24-28 Since none
has full explanatory power, a composite
explanation is likely to be more
accurate. The discovery of mimiviruses
and megaviruses, which mimic many
parasitic cellular organisms and contain
a partial translational apparatus, including
several aminoacyl-tRNA synthetases
(aaRSs) that are apparently functional, 17-19
now challenges the boundaries between
cells and viruses. The discovery of giant
viruses calls for the inclusion of viruses
(at least those with larger genomes) into
global phylogenetic studies. 3,29-31 In a
recent breakthrough phylogenomic study, 
we used a census of protein domain struc- 
tures in over a thousand genomes to study 
the origin of giant viruses. 32 Remarkably, 
viruses appear alongside cells on a
comparable evolutionary time scale and
form a basal and distinct "supergroup"
in a truly universal ToL. The phylogenomic
analysis also produced network
trees portraying universal ToLs very
much like those reconstructed in the
past. 12-14 However, a distinct and unified
viral supergroup was present at the base 
of the ToL before the emergence of super- 
kingdoms, suggesting an ancient origin 
of giant viruses. To our knowledge, this 
is the first exercise that makes extensive 
and global use of molecular data to study 
viral evolution. In this study, we pur- 
posely sampled only dsDNA viruses that 
have large-to-medium genomes and are 
quite complex. 33,34 Their large proteomic 
makeup makes the sampling of viral 
domain structures comparable to cells. 

What is the benefit of focusing on 
structure? Recent advancements in 
genomics and structural biology offer 
a wealth of molecular information that 
can be coupled with standard evolution- 
ary bioinformatic tools to test alternative 
evolutionary models. However, it is cru- 
cial that the right molecular feature and 
approach be employed when studying 
deep evolutionary relationships. Inferring 
molecular phylogenies (statements of evo- 
lution) using protein domain structures 
has been shown in a number of studies 
to successfully recover reliable phylogenetic
signatures. 12-14 Protein domains
grouped into fold families (FFs, domains 
with high sequence conservation) and fold 
superfamilies (FSFs, domains with struc- 
tural and functional evidence of common 
ancestry) are clearly useful study subjects 
for global phylogenomic analyses. 14,35,36 
These protein fold structures are more
conserved than genetic sequences, which
are highly variable and usually cannot
hold deep historical evidence. 37

Figure 2. Evolution of the protein world. The diagram, drawn to approximate scale, shows a
cartoon of a universal tree of life inferred from a phylogeny of protein domains. Time unfolds
from bottom to top according to the age of FSF protein domains (nd) in a relative 0-1 scale and
in geological time (billions of years, Gy) according to a molecular clock of folds. 44 The horizontal
axis is proportional to the number of FSFs. Extant FSF repertoires are indicated for supergroups
(superkingdoms and viruses). The FSFs that are unique to supergroups are highlighted with different
color shades in the phylogeny. The common ancestor of the lineages of cells and large-to-
medium-sized DNA viruses (LUCA) and the common ancestor of cellular organisms belonging to
superkingdoms Archaea, Bacteria and Eukarya (LUCELLA) are indicated with circles at the base
of the universal "tree of life." The bar plots show FSFs that are unique to supergroups or that are
shared with viruses or cells. Note the significant number of structures shared by viruses and cells.

In addition, the structural phylogenomic methodology
(see refs. 12-14) is robust against
many artifacts resulting from sequence- 
based phylogenetic reconstruction 37 and 
provides an appropriate model for study- 
ing viral evolution. In the study of Nasir 
et al., 32 viral FSF structures were assigned 
to genomic sequences using advanced 
hidden Markov models (HMMs) of 
structural recognition and cellular FSFs 
were downloaded directly from the 
SUPERFAMILY database. 38,39 The census
of FSF abundance was then used to build 
phylogenies describing the evolution of 
protein domains and proteomes. Figure 2 
summarizes the main results of our study. 
Remarkably, the census by itself already
uncovers important patterns. A total of 304
FSF domains were detected in the 56 viral 
proteomes, including 229 FSFs that were 
also present in all three cellular superking- 
doms (Archaea, Bacteria and Eukarya). 
The majority (> 50%) of these "univer- 
sal" FSFs were of ancient origin when 
they were traced on an evolutionary time- 
line obtained from phylogenies of pro- 
tein domains (Fig. 2). The most ancient 
structures were important for metabolism 
and translation, some of which are part of 
membrane proteins, suggesting a cellular 
primordial origin of viruses. The axis of 
the timeline unfolds relative time in a 0-1
scale, from the origin of protein domains
(nd = 0) to the present (nd = 1) (see refs.
12-14). These ancient and universal FSFs
are a clear molecular testament of the very 
early coexistence of primordial viruses and 
cells before cellular diversification. The 
observation that FSFs shared with viruses 
represent a significant fraction of FSFs 
in each superkingdom is also remarkable 
(Fig. 2). These patterns underscore the 
central role of viruses in protein evolu- 
tion. In addition, six virus-specific FSFs 
absent in cells make up capsids or are part 
of proteins necessary for cell attachment 
or inhibition of cellular apoptosis. These 
very few virus-specific FSFs appeared 
quite late in the timeline (at nd ~0.6) and
almost concurrently with Archaea-specific 
and Eukarya-specific FSFs, confirming 
the cell-like nature of primitive viruses. 
Without virion structures and functions,
which are unique viral hallmarks, 40 primitive
viruses had to multiply very much
like cells. Consequently, they could not 
spread efficiently and in high numbers 
into the harsh environments of early 
Earth. Alternatively, they could have 
been also integral components of primi- 
tive cells. 26 
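
The census comparison described above can be illustrated with a minimal Python sketch. It assumes that the FSF repertoire of each supergroup is available as a set of identifiers. The function name `census_summary`, the category labels and the toy identifiers are hypothetical and are not taken from the study, which assigned FSFs with HMMs and the SUPERFAMILY database.

```python
from typing import Dict, Set

CELLULAR = ("Archaea", "Bacteria", "Eukarya")


def census_summary(repertoires: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Partition viral FSFs by how widely they are shared with cells.

    `repertoires` maps a supergroup name to the set of FSF identifiers
    detected in it (hypothetical input format, for illustration only).
    """
    viral = repertoires["Viruses"]
    in_all_cells = set.intersection(*(repertoires[g] for g in CELLULAR))
    in_any_cell = set.union(*(repertoires[g] for g in CELLULAR))
    return {
        # Viral FSFs also present in all three cellular superkingdoms.
        "universal": viral & in_all_cells,
        # Viral FSFs present in some, but not all, superkingdoms.
        "shared_with_some_cells": (viral & in_any_cell) - in_all_cells,
        # Viral FSFs absent from every cellular superkingdom.
        "virus_specific": viral - in_any_cell,
    }


# Toy repertoires with made-up identifiers, not real FSF data.
toy = {
    "Archaea": {"fsf_a", "fsf_b", "fsf_c"},
    "Bacteria": {"fsf_a", "fsf_b", "fsf_d"},
    "Eukarya": {"fsf_a", "fsf_b", "fsf_d", "fsf_e"},
    "Viruses": {"fsf_a", "fsf_d", "fsf_v"},
}
summary = census_summary(toy)
for category, fsfs in summary.items():
    print(category, sorted(fsfs))
```

Applied to real repertoires, the "universal" set would correspond to the 229 FSFs shared by viruses and all three superkingdoms, and the "virus_specific" set to the six FSFs absent in cells.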

Origin of Modern Viruses 
from Primordial Cells 

We envision a scenario in which the 
last universal common ancestor of life 
(LUCA) (from Latin: dative plural of lux, 
f, to become visible, shine) gave birth to 
(at least) two descendants: (1) the last 
universal cellular ancestor (LUCELLA) 
(from Latin: nominative plural of lucel- 
lum, i, dim. lucrum, small gain; a succes- 
sion of small changes), and (2) the archaic 
virocell ancestor. LUCELLA was the 
ancestor of cells that evolved ribosomes 
and advanced protein biosynthesis, the 
ribocells. Its sibling, the archaic virocell 
was the ancestor of a lineage of cells that 
never unfolded ribosomal machinery and
ultimately transformed into viral parasites
and modern virocells (Fig. 2). 

Because viruses infect all three super- 
kingdoms of life, Forterre 23 proposed 
that virocells predated "modern cells" or 
the descendants of LUCA. 23 Our data 
advocate an expansion of this idea. Our
timelines relate the origin of viruses to 
primitive ribosome-free cells committed 
to a reductive evolutionary path (Fig. 2). 
These cells were ancient virocells but with- 
out a capacity to produce virions (i.e., they 
lacked the reproductive feature of modern 
virocells). Our phylogenomic data sug- 
gest these virocell ancestors coexisted with 
evolving cellular lineages. Remarkably, 
there is accumulating microfossil evidence 
in 3-3.4 billion-year (Gy)-old black chert
beds and shallow-marine siliciclastic deposits
of cells of spheroidal and spindle-like
shapes. 41-43 These microfossils are biogenic
microstructures of two broad size ranges,
5-25 μm and ~300 μm in size. 42,43 We
contend that microfossil size variation 
could simply represent coexisting cellular 
lineages. Since a molecular clock of folds 
indicates that LUCELLA existed 2.9 Gy 
ago 44 and cell size scales with genome complexity
(Yafremava et al., manuscript submitted),
microfossil size variation could
result from early reductive evolutionary 
processes acting on genomic complements 
of the primitive virocell lineages. 

Our phylogenomic data also indicate
that major capsid proteins and other
proteins necessary for viral pathogenicity
were acquired late (~1.6 Gy ago) and
simultaneously with the appearance of 
superkingdom-specific FSFs. The late 
appearance of capsid proteins in our time- 
lines disagrees with the prevalent views 
of capsid origin. Capsids are sheltering 
"envelopes" of viral genomes, which are 
necessary for viral spread and infection. 45 
They are considered viral hallmarks 40 
that are shared by many diverse viral 
groups and are used to unite viruses into 
capsid-encoding organisms (CEOs). 27 
In contrast, ribosomes are the hallmarks 
of cells, defining cellular entities as ribo- 
some-encoding organisms (REOs). 27 We 
propose that in the absence of capsids, the 
initial viral lineage was necessarily cellular 
and one of many "laboratories" explor- 
ing alternative biochemistries for life. 26 
This ancient lineage failed to retain most 
of the translation machinery and never 
developed ribosomal protein biosynthesis, 
since ribosomal proteins and rRNA are 
absent in viruses. Its genome was prob- 
ably an integral component that was com- 
partmentalized. While many scenarios are 
possible, ancestral forms of volutin gran- 
ules (acidocalcisomes) could have hosted 
primordial virocell nuclei and could have 
acted as evolutionary primordia for the 
stabilization of capsids and for the packag- 
ing of virocell genomes. Acidocalcisomes 
are ancient versatile organelles that store 
polyphosphates, calcium and metals, have 
regulatory roles during cell division and 
are present in all three superkingdoms. 46 
Since membrane lipids are part of many 
viruses and are involved in the initial 
phases of viral infection, 47 we hypoth- 
esize that virocell membranes could have 
supported the appearance of first capsid 
proteins and could have facilitated the 
formation of "factories" in cellular hosts 
responsible for the first infectious viral 
cycles. This hypothesis implies that primi- 
tive viruses were in fact primordial cells 
with limited cytoplasmic structure but
with a streamlined organelle-rich makeup.
The constant reductive pressure on these 
cellular laboratories eventually led to 
secondary adaptations (i.e., parasitism) 
and the development of capsids and true 
virocells. 23 

Parasitism in Viruses: 
An Afterthought Triggered 
by Genome Reduction 

One of the main properties used to define 
modern day viruses is their parasitic nature. 
Viruses are able to infect cells and take over 
cellular machinery of the host for their 
own replication. Our data suggest that 
viral parasitism was an afterthought likely 
triggered by both gradual loss of genes in 
ancient virocells and the opportunity to 
exploit the expanding ribocell molecu- 
lar resources. 32 Thus, and in light of our 
results, current definitions of viruses must 
be revisited, as they are only applicable to 
extant viruses. Importantly, the structural 
makeup of the ancient viral lineage should 
be considered. Evolutionary timelines of 
domains uncovered a bimodal evolution- 
ary pattern; the majority of the domain 
structures in viruses appeared either very 
early or very late in evolution. Timelines 
also revealed that while the global protein 
repertoire was in permanent expansion, 
genome reduction was the earliest primary 
force shaping both the viral and cellular 
proteomes. However, loss of ancient genes 
first started in viruses and was then fol- 
lowed by losses in superkingdoms, begin- 
ning with Archaea. These reductive trends 
are compatible with patterns of evolution 
of cells described previously 12,48 that are
operating in microbial parasites and obli- 
gate parasites. Functional annotations 
also supported the view that very early 
in the timeline, viruses were functionally 
active and not much distinct from cells. 32 
The ancient viral structures served meta- 
bolic, informational and gene regulation 
functions, very much like cells. 45 With 
the introduction of reductive evolution- 
ary forces, most of the ancient structures 
were lost from viral proteomes and many 
were never adopted. This included the 
loss or lack of acquisition of advanced 
translational machinery. This explains 
why the largest viruses (mimiviruses and 
megaviruses) have retained genes encoding
only a partial translation apparatus (up to 7
out of the 20 aaRSs). 18,19 This machinery
is most likely the remnant of an advanced 
functional apparatus that was once pres- 
ent in the ancestor of these viruses. 19 
Our contention is that genome reduction 
resulted in a transition to the parasitic life- 
style later in the evolutionary timeline. We 
have previously linked parasitism in cells 
to genome reduction and the appearance 
of domain structures unique to superkingdoms. 49
Viruses appeared to be no
different. 32 The viral-specific structures 
acquired late in the timeline served sup- 
portive functions for viral pathogenicity. 
This included capsids, which appeared 
late and concurrently with mechanisms to 
suppress host defenses. Capsids crucially 
distinguish modern viruses from other 
mobile elements, such as plasmids, RNA 
satellites and transposons. 45 Because cap- 
sids are widespread among diverse groups 
of viruses, they are considered to be very 
ancient pre-LUCA discoveries. Our results, 
which are supported by molecular data 
and objective evolutionary bioinformatic 
approaches, indicate, however, that the 
appearance of capsid proteins postdated 
both LUCA and LUCELLA by at least 
1.3 Gy. We therefore present an alternative 
view in which the appearance of capsid 
coincides with the appearance of modern 
cells and viral adaptations to parasitism. 
From this point onwards, archaic virocells 
started to acquire additional structures 
necessary for infecting the descendants of 
LUCELLA. Evolutionary forces that were 
predominant in this later stage included 
(but are not limited to) gene duplication, 
recombination and HGT. These forces
were primarily responsible for enhancing 
the genetic repertoires of cells 50 once the 
adaptation to the parasitic mode in viruses 
was completed. 

LUCA Predated LUCELLA 

The HMM census, evolutionary timelines 
and the universal ToL support the ancient 
origin of viruses and point toward the 
existence of a new urancestral ~3.4-Gy-old
entity that was already present before a 
redefined LUCELLA. This entity, the true 
"LUCA" of (all) life descended by gradual 
change into the cellular lineage that gave 
birth to LUCELLA, the three cellular 
superkingdoms and modern ribocells 23
and the archaic virocell lineage that gave
rise to virions and true virocells. 23 The
original two main lines of descent pre- 
served distinct features, which manifest 
today in genomic makeup. Since a mod- 
ern virocell implies a transitive stage of the 
viral life cycle that includes both the cellu- 
lar host and the viral pathogen, the primi- 
tive virocell must be considered a stable 
cell. This virocell lacked the complexities 
of a viral life cycle and had not yet devel- 
oped its "virosphere" generating abilities. 
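
The relative timing of these events can be checked with a short worked calculation. The sketch below uses only the approximate ages quoted in the text (LUCA ~3.4 Gy, LUCELLA 2.9 Gy, capsid proteins ~1.6 Gy). The dictionary and function names are hypothetical, and the values are rounded estimates rather than precise dates.

```python
# Approximate ages quoted in the text, in billions of years (Gy) before present.
AGES_GY = {
    "LUCA": 3.4,
    "LUCELLA": 2.9,
    "capsid proteins": 1.6,
}


def gap_gy(older: str, younger: str) -> float:
    """Return the time elapsed between two events, rounded to 0.1 Gy."""
    return round(AGES_GY[older] - AGES_GY[younger], 1)


gaps = {
    "LUCA to LUCELLA": gap_gy("LUCA", "LUCELLA"),
    "LUCELLA to capsid proteins": gap_gy("LUCELLA", "capsid proteins"),
    "LUCA to capsid proteins": gap_gy("LUCA", "capsid proteins"),
}
for interval, gy in gaps.items():
    print(f"{interval}: {gy} Gy")
```

The gap between LUCELLA and the appearance of capsid proteins is the source of the "at least 1.3 Gy" figure given earlier; the gap measured from LUCA is correspondingly larger.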

Conclusions 

Our structural phylogenomic infer- 
ences enable the proposal of a composite 
theory for the evolution of giant viruses and
viruses in general. We propose that giant 
viruses (with their DNA genomes) are 
remnants of an ancient virocell lineage 
that once coexisted with cellular lineages 
either independently or compartmental- 
ized within the primitive cells. This helps 
explain the presence of most ancient FSFs 
that were identified in both viruses and 
cells. This viral lineage suffered massive 
gene loss throughout evolutionary history. 
While we have not yet explored the ulti- 
mate cause of this reductive process, we 
conjecture that co-evolution of the ancient 
cells was instrumental for the development 
of nucleic acid repositories and modern 
genetics. Patterns of biochemical special- 
ization could have initially favored small 
and compartmentalized genomic reper- 
toires in the virocell lineage, putting in 
motion irreversible selective pressures for 
reductive evolution that were absent in the 
cellular lineage. This tendency required 
a focus on economy of resources and 
fast reproductive spread for persistence, 
which likely triggered increasingly smaller 
organismal entities, the need to adapt to 
a parasitic lifestyle and the development 
of the capsid container as a strategy of ultimate
persistence. This path to obligate
parasitism mimics that of cellular para- 
sites. 49 Since HGT appeared to play only 
marginal roles very late in evolution, 32 
perhaps once the parasitic adaptation was 
completed, it is possible that other viral 
groups (such as those with RNA genomes) 
followed the same path. Under this model, 
evolution of parasitic cellular species into
viruses may still be active. 51 Future analysis
of the entire virosphere will yield significant
insights into the evolution of all
viruses and will test if indeed they have 
a single (monophyletic) or multiple (poly- 
phyletic) origin. 